ABSTRACT

This paper introduces the NK Echo State Network. The
problem of learning in the NK Echo State Network is reduced
to the problem of optimizing a special form of a Spin
Glass Problem known as an NK Landscape. No weight adjustment
is used; all learning is accomplished by spinning
up (turning on) or spinning down (turning off) neurons in
order to find a combination of neurons that work together
to achieve the desired computation. For special types of
NK Landscapes, an exact global solution can be obtained
in polynomial time using dynamic programming. The NK
Echo State Network is applied to a reinforcement learning
problem requiring a recurrent network: balancing two poles
on a cart given no velocity information. Empirical results
show that the NK Echo State Network learns very rapidly
and yields very good generalization.

1. INTRODUCTION

This paper introduces the NK Echo State Network. Enhancements
are added to the Echo State Network such that
the problem of learning is reduced to the problem of optimizing
an NK Landscape. The binary input to the NK
Landscape turns on and off a set of N neurons in the enhanced
Echo State Network.

The dominant learning paradigm for training neural networks
involves adjusting the weights that connect artificial
neurons using back propagation [14]. Similar learning methods
are also used by support vector machines: for classification
problems, both methods involve optimizing vectors
of weights which create and move interacting hyperplanes
to produce decision surfaces that can be used to separate
training examples into the appropriate classifications.

“Reservoir computing” refers to a class of neural network
learning models, of which the Echo State Network is one example.
Reservoir computing networks use a reservoir of
sparsely connected artificial neurons that have randomly
generated weighted connections. The input neurons are connected
to neurons in the reservoir, and the output neurons
are also connected to neurons in the reservoir. The weights
inside the reservoir are not adjusted by learning.
Thus, reservoir computing has moved one step away
from optimizing vectors of weights as a learning paradigm;
its ability to learn depends on having a large reservoir of
potentially useful neurons and on having a way to determine
which neurons are useful. In reservoir computing,
learning still involves adjusting the weights that connect
artificial neurons in the reservoir to the outputs [11][10][13].

In the current paper, the problem of learning is recast as 
a “Spin Glass Problem” in which N neurons interact with K 
other neurons, and each neuron spins up (on) or down (off) 
in order to find a combination of neurons that work together 
to achieve the desired computation. The limitation to K 
interactions can make the optimization problem tractable. 

For the NK Echo State Network, there exist new, highly
efficient optimization methods that make it possible to
know in constant time exactly which neuron to turn on or off
in order to obtain an improvement in performance [1]. For
special classes of NK Echo State Networks, we can prove
convergence to a combination of neurons that is guaranteed
to be globally optimal; an exact solution to the problem of
determining which neurons to use can be obtained in polynomial
time using dynamic programming [22].

The class of k-bounded pseudo-Boolean functions is defined
such that 1) the problem representation (i.e., the domain)
is the set of binary strings, and 2) the output of the
evaluation function (i.e., the co-domain) is the set of real
numbers. Pseudo-Boolean problems for inputs of length N
are k-bounded when the evaluation function can be decomposed
into a linear combination of subfunctions, and each
subfunction takes a subset of at most k bits as input, where
k is a constant. NK Landscapes [8][18], Spin Glass Problems
[23] and MAX-kSAT [9][12] are examples of well known
k-bounded pseudo-Boolean optimization problems. All of
these problems use an evaluation function of the form:

$$ f(x) = \frac{1}{M} \sum_{i=1}^{M} f_i(x) \qquad (1) $$

where X is the function domain and $x \in X$ is a bit vector,
and each subfunction $f_i$ is evaluated using a subset of k bits
drawn from the bit vector x. When k is small, each function
$f_i$ can be expressed as a lookup table. In MAX-kSAT, each
subfunction is a clause in CNF form, where the size of the
clause is less than or equal to k. Each clause is false or
true (returning 0 or 1) and the evaluation function f sums
over all M clauses. In an NK-Landscape, M = N and each
subfunction $f_i$ uses bit $x_i$ as well as K additional bits as
input. Thus, for NK-Landscapes, k = K + 1 and the output
of subfunction $f_i$ can be any real valued number.
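To make the lookup-table view concrete, the following is a minimal Python sketch of a random NK landscape. The helper names (`make_nk_landscape`, `evaluate`), the zero-based indexing, and the uniform random table entries are illustrative assumptions, not part of the original method. The sketch returns the plain sum of the subfunctions; a constant normalization factor would not change which x is best.

```python
import random

def make_nk_landscape(n, k, seed=0):
    """Build a random NK landscape (illustrative assumption: uniform table values).

    masks[i]  lists the K+1 bit positions read by subfunction f_i (bit i first).
    tables[i] holds one real value per bit pattern, i.e. 2**(K+1) entries.
    """
    rng = random.Random(seed)
    masks, tables = [], []
    for i in range(n):
        others = rng.sample([j for j in range(n) if j != i], k)
        masks.append([i] + others)
        tables.append([rng.random() for _ in range(2 ** (k + 1))])
    return masks, tables

def evaluate(x, masks, tables):
    """Sum of the subfunction values f_i(x); each f_i is a table lookup."""
    total = 0.0
    for mask, table in zip(masks, tables):
        index = 0
        for bit_pos in mask:              # pack the selected bits, MSB first
            index = (index << 1) | x[bit_pos]
        total += table[index]
    return total
```

Each call to `evaluate` costs N table lookups of K + 1 bits each, which is what makes k-bounded functions cheap to evaluate.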


To construct an NK Echo State Network, we convert the
problem of selecting which neurons to utilize into an NK-Landscape
optimization problem. In the current paper, we
will consider an enhanced Echo State Network with a single
output. One of the enhancements is to use an ensemble of
N outputs. A second enhancement is to add an additional
layer of N neurons between the reservoir and the ensemble of
outputs; we will refer to this layer of neurons as the probe
filter. Each neuron in the probe filter acts as a probe,
sampling the reservoir so as to create a different neural circuit.
Each of the N neurons in the ensemble of outputs is
connected to k neurons in the probe filter. This use of
the probe filter has some similarities to the use of filter
neurons by Holzmann and Hauser in Echo State Networks
[6]; they use the filter neurons to add an additional layer of
weight optimization. However, we have a very different
purpose for the filter neurons.

The ensemble of N output neurons combined with the 
N neurons in the probe filter can be expressed as an NK 
Landscape optimization problem. Each of the neurons in the 
probe filter layer is a neural circuit; the NK-Landscape is 
agnostic about how the neural circuits are created and needs 
no information about the reservoir. Each of the N output 
neurons becomes a subfunction in the NK-Landscape, and is 
connected to k neurons (neural circuits) in the probe filter 
layer; a binary string of length N turns on and off the N 
neurons in the probe filter. 

At most $2^k N$ online learning samples are needed to convert
a learning problem into an NK-landscape. The NK
Landscape function is a lookup table which stores all of the
$2^k$ evaluations for each of the N outputs. The NK-landscape
can then be optimized offline using highly efficient optimization
methods. This optimization process selects the best
subset of neural circuits from the probe filter layer to use
for the assigned learning problem.
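As a worked count, take the ensemble size N = 20 and the smallest and largest connectivity values K = 2 and K = 8 used in the experiments later in the paper, with k = K + 1:

```latex
\[
2^{k} N = 2^{K+1} N : \qquad
K = 2 \;\Rightarrow\; 2^{3} \cdot 20 = 160, \qquad
K = 8 \;\Rightarrow\; 2^{9} \cdot 20 = 10240 .
\]
```

The number of online evaluations therefore grows linearly in N but exponentially in K.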

The transformation of the neural network training problem
into an NK Landscape optimization problem represents
a novel approach to learning. Given current interest in “Deep
Learning” and more complex forms of neural networks, this
approach to learning may have a wide range of applications.

We illustrate this new learning method by applying it to
the reinforcement learning problem of balancing two poles on
a cart while providing only the cart position and the two pole
angles as input. This means the NK Echo State Network
must learn to compute velocity information. Our empirical
results show that the NK Echo State Network not only
learns rapidly, but that the resulting networks also produce very good
generalization.

2. BACKGROUND: NEURAL NETWORKS 
AND RESERVOIRS 

In his 1987 book Neural Darwinism, G.M. Edelman
advanced the idea that “group selection” acting on neurons
could result in a computational form of learning. Stated
concisely, a diverse set of inputs can be used to test and
select for neural circuits that respond appropriately to those
inputs. For example, Edelman conjectured cell death (as
well as cell duplication and the development of cell axons
and dendrites) might also be controlled by some form of
functional selection. Reduced to its simplest form, this gives
rise to a theory where learning might be achieved in a neural
architecture by turning on and turning off neurons as part
of an effort to identify “useful neural circuits.”

In the evolutionary computation community, there is a
long history of neuroevolution spanning 25 years. This work
combines evolutionary optimization methods with neural
networks and related machine learning methods such as support
vector machines and echo state machines. Most of these
methods utilize some mixture of learning the architecture of
the neural network (including the number of neurons to use
and how they should be connected) and learning
the weights in the networks.

The dominant paradigm for training most neural networks
is back-propagation, where learning almost exclusively focuses
on adjusting the weights in the neural network. Much
of the work in the evolutionary computation community has
focused on reinforcement learning applications. These are
often control problems as well. In reinforcement learning
applications the evaluation of the system is based on defining
the desired behavior of the system, because it is not possible
to directly know the correct actions to take in order
to control the system. An additional problem is that reinforcement
learning problems are often time dependent and
the input data to the neural network are expressed as a time
series. This means that “recurrent neural networks” are often
required for reinforcement learning applications. Thus,
reinforcement learning applications pose two problems for
traditional gradient methods such as back propagation: 1)
only the desired behavior of the system is specified, not the
specific desired actions that the system should display in
response to a particular input, and 2) recurrent neural networks
are not easily trained using gradient methods such as
back propagation.

In the last 10 years another approach to training recurrent
neural networks for reinforcement learning has focused
on “reservoir computing” methods, one example of which is the
“echo state network.” A sparsely connected set of neurons
with recurrent connections is created; the weights between
this set of neurons are set randomly, typically with random
values that are relatively small (e.g., less than ±1.0). This
forms the reservoir. Input neurons are then connected to
the neurons in the reservoir. Neurons in the reservoir are
also connected to the output neurons. This idea is illustrated
in Figure 1, where there is a single output. One of the motivations
for reservoir computing is the work of Schiller and
Steil [16], which shows that when gradient methods are used
to train recurrent neural networks, most of the weight
changes occur in the weights that connect to outputs, even
if the methods are being used to change all of the weights
in the network. Thus, given a large reservoir of neurons,
some combinations of neurons should be more useful than
others. Training methods that optimize the weights between
the neurons in the reservoir and the outputs can, essentially,
determine which neurons and groups of neurons are doing
useful computations. In this sense, reservoir computing has
connections back to Neural Darwinism and, more loosely, to
neuroevolution.

This reservoir computing model addresses the problem of
dealing with time dependent inputs that require recurrent
neural networks. But it does not directly address the reinforcement
learning problem: we still only know the desired
behavior of the system, but not exactly what actions the system
should display from one time step to the next. Thus,
traditional error propagation methods such as Back Propagation
cannot directly be used to train Echo State Networks
for reinforcement applications.

Figure 1: An Echo State Network with an input layer, a reservoir of neurons, and a single output. The
recurrent neurons of the reservoir are sparsely connected with density α. There is a single output neuron
in this case. All neurons in the reservoir are connected to the output neuron. There are no recurrent
connections originating from the output neurons.

One could use temporal difference methods or Q-learning
methods for reinforcement learning problems. However,
Gomez, Schmidhuber and Miikkulainen [4] have shown
that a wide range of methods based on these ideas do not
scale up and do not work well on more difficult reinforcement
learning problems. They used Q-learning with a Multilayer
Perceptron that mapped state-action pairs to Q-values.
They also compared to methods such as Sarsa(λ) with Case
Based Function Approximators and Sarsa(λ) with a Cerebellar
Model Articulation Controller [15]. They concluded that
these methods were less effective and less efficient for complex
reinforcement learning tasks compared to neuroevolution
based methods such as NEAT, ESP and CoSyNE [4].

Methods such as NEAT [17] emphasize neuron selection
and refinement; thus, NEAT (and HyperNEAT) exploit a
kind of Neural Darwinism. Methods such as ESP [3] and
CoSyNE [4] focus on weight optimization for a fixed architecture.

The current paper takes the work on reservoir computing
one step further: we reconfigure the learning problem so that
no weight adjustment at all is used during learning. Instead,
learning is accomplished using only neuron selection.

3. IMPLEMENTATION DETAILS 

In the current paper, we make two enhancements to the
Echo State Network that enable the network to identify “useful
neural circuits.”

We will work with an Echo State Network with one output.
The Echo State Network with one output has been
enhanced by adding an ensemble of N outputs, all of which
compute the same output. The second enhancement is to
add a probe filter layer with N neurons. The N neurons in
the probe filter layer in effect probe the reservoir to generate
N different neural circuits. Each neuron of the output
layer is connected to k neurons in the probe filter layer.
This is illustrated in Figure 2.

In the Echo State Network used here, all the weights of
the network are randomly created and kept fixed during
training. During training, only the bit vector $x \in \{0,1\}^N$ is
optimized; the vector x specifies which neurons in the probe
filter layer are activated. If $x_i = 1$, the i-th neuron of
the probe filter is turned on, i.e., its activation is used as
input for the neurons in the output layer connected to it. If $x_i = 0$,
the i-th neuron of the probe filter layer is turned off. The
connectivity of the output layer to the probe filter layer
must be sparse. Each neuron of the output layer receives
inputs from the outputs of k neurons of the probe filter
layer. Following the usual definition of an NK-Landscape,
the i-th neuron of the probe filter layer is always connected
to the i-th output neuron. In NK-Landscapes, each bit in
each subfunction interacts with K additional bits.

We will use both the adjacent NK-Landscape model and
the random NK-Landscape model in our experiments. In
the adjacent NK-Landscape model, the i-th output neuron
is connected to neurons i, i + 1, ..., i + K of the probe filter
layer. In the random NK-Landscape model, the i-th output
neuron is connected to neuron i and K other random neurons
of the last hidden layer. Solving the optimization problem
for the random NK-Landscape is NP-Hard. However,
a polynomial time dynamic programming algorithm yields
a global optimum for the adjacent NK-Landscape problem [22].
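The two neighborhood models can be sketched as index lists, one per output neuron. This is an illustrative sketch: the function names are hypothetical, indices are zero-based, and the wrap-around (indices taken modulo N) in the adjacent model is an assumption that the text does not state explicitly.

```python
import random

def adjacent_mask(n, k):
    """mask[i] = [i, i+1, ..., i+K]; indices wrap modulo N (assumed)."""
    return [[(i + j) % n for j in range(k + 1)] for i in range(n)]

def random_mask(n, k, seed=0):
    """mask[i] = [i] plus K distinct other probe-filter neurons chosen at random."""
    rng = random.Random(seed)
    return [[i] + rng.sample([q for q in range(n) if q != i], k)
            for i in range(n)]
```

In both models neuron i always appears in `mask[i]`, matching the convention that the i-th probe-filter neuron feeds the i-th output.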



Our implementation will use two neuron selection mechanisms.
1) The function mask(i,j) indicates an architectural
feature of the neural network: it returns the j-th neuron in
the probe filter layer that provides an input to the output
neuron i. This level of detail usually is not explicitly denoted
in neural networks. But here it is an important part
of the NK-Landscape design. 2) The vector x turns on and
off neurons in the probe filter layer.

At each iteration the i-th output of the NK neural network
is given by:

$$ y_i = \phi\left( \sum_{j=1}^{K+1} w_{\mathrm{mask}(i,j),\,i}\; S(\mathrm{mask}(i,j))\; x(\mathrm{mask}(i,j)) \right) \qquad (2) $$

where $\phi(\cdot)$ is a sigmoidal shaped activation function. The
function mask(i,j) is a look-up table that returns the index
of the j-th neuron that connects to output i. Assume
mask(i,j) = q. Then $w_{q,i}$ is the weight between the q-th
neuron in the probe filter, which is one of the K + 1 neurons
that connect to the i-th output neuron, and that output neuron.
S(q) is the output of neuron q of the probe filter layer; finally,
x(q) indicates whether the q-th neuron in the probe filter layer
is currently turned on or off. If x(q) = 1 then
$w_{q,i} S(q) x(q) = w_{q,i} S(q)$; if x(q) = 0 the term vanishes.
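A minimal sketch of this output computation follows. It assumes the hyperbolic tangent as the sigmoidal function (the activation reported in the experiments), zero-based indices, and a nested-list layout `weights[q][i]` for the weight from probe-filter neuron q to output i; the function name and data layout are illustrative only.

```python
import math

def output_i(i, mask, weights, s, x):
    """y_i = tanh( sum over q in mask[i] of w[q][i] * S(q) * x(q) ).

    mask[i]    : indices of the K+1 probe-filter neurons feeding output i
    weights    : weights[q][i] is the fixed weight from neuron q to output i
    s          : current activations S(q) of the probe-filter neurons
    x          : 0/1 vector; x[q] = 0 removes neuron q's contribution
    """
    total = 0.0
    for q in mask[i]:
        total += weights[q][i] * s[q] * x[q]
    return math.tanh(total)
```

Because x only multiplies each term by 0 or 1, switching a neuron off is equivalent to temporarily zeroing its outgoing weight without ever changing the stored weights.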

Figure 2: An NK Echo State Network. The Echo State Network with one output has been enhanced by
adding an ensemble of N outputs, all of which compute the same output. The second enhancement is to add
the probe filter layer. Each neuron of the output layer is connected to k neurons in the probe filter layer.
The neurons of the probe filter layer are turned on or off according to a binary vector x. If $x_i = 1$, the i-th
neuron is turned on; if $x_i = 0$, the i-th neuron is turned off. All weights are randomly created and are kept
fixed during the optimization of x.

To see how this relates to standard neural networks, assume
we defined an alternative neural network where all
neurons in the probe filter are connected to every output
neuron, but only K + 1 of the weights that connect to a specific
output neuron are non-zero. Thus a zero weight is the
same as no connection. Let the alternative weight vector be
denoted by w'. Note that the original network and the alternative
network yield identical computations. Again, assume
that mask(i,j) = q.

$$ y_i = \phi\left( \sum_{j=1}^{K+1} w_{\mathrm{mask}(i,j),\,i}\; S(\mathrm{mask}(i,j))\; x(\mathrm{mask}(i,j)) \right) $$

Finally, assume that all of the bits in vector x are set to 1
so that all of the neurons in the probe filter layer are on:
we then obtain a standard neural activation function:

$$ y_i = \phi\left( \sum_{q=1}^{N} w'_{q,i}\, S(q) \right) $$

3.1 Mapping to an NK-Landscape 

In the neural network investigated here, every output neuron
is evaluated independently. This is because we want the
set of N outputs to operate as an ensemble. Because the
weights of the network are fixed, the evaluation of the i-th
output neuron for a given problem depends only on the binary
vector x and on the function mask(i,j).

We denote the evaluation of the i-th output neuron as
$f_i(x)$; we can assume function $f_i$ will automatically implement
mask(i,j) when passed an input of length N. It is
also convenient to assume that $f_i$ can also take an input of
length K + 1; thus $f_i(x) = f_i(0101)$ if $f_i$ uses mask(i,j) to
extract the bit pattern 0101 from vector x.

For example, assume N = 20 and K + 1 = 4. Also assume
that the “on/off” pattern of the 4 neurons used by
output neuron number 19 is given by 0101. Then $f_{19}(x) =
f_{19}(0101)$ evaluates the behavior of the network at output
neuron 19 when the 1st and 3rd neurons that feed into
output 19 are turned off, and the 2nd and 4th neurons are
turned on. A single number is stored representing the performance
of output neuron 19 when using only the second and fourth
neurons. For an output neuron i, the performance is recorded
for every possible “on/off” pattern over the K + 1 neurons
to which it connects in the probe filter. This results in a
look-up table of performance evaluations with $2^{K+1}$ entries
for each subfunction $f_i(x)$.

This is repeated for each of the N output neurons. Thus,
the total number of Echo State Network evaluations will
be exactly $2^{K+1} N$ (unless there is some overlap in
the computation of the subfunctions that would allow some
of them to be evaluated in parallel).
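The table-building step can be sketched as a loop over outputs and bit patterns. Here `evaluate_output(i, x)` is a hypothetical stand-in for running the network on the task with probe-filter setting x and returning the performance of output i; the name, the zero-based indices, and the choice to set unused bits to 0 are assumptions for illustration.

```python
from itertools import product

def build_tables(n, mask, evaluate_output):
    """Return tables[i][p] = performance of output i under bit pattern p.

    Patterns are enumerated in binary order (first mask entry is the most
    significant bit), so tables[i] has 2**(K+1) entries.
    """
    tables = []
    for i in range(n):
        table = []
        for pattern in product((0, 1), repeat=len(mask[i])):
            x = [0] * n                      # neurons outside mask[i] do not affect output i
            for q, bit in zip(mask[i], pattern):
                x[q] = bit
            table.append(evaluate_output(i, x))
        tables.append(table)
    return tables
```

With N outputs and K + 1 entries per mask, this makes exactly N times 2^(K+1) calls to `evaluate_output`, matching the count in the text.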

We optimize the vector x in order to maximize $f_i(x)$
when summed across all of the N outputs of the Echo State
Network. Thus, for the combined N outputs, the evaluation
function is:

$$ f(x) = \frac{1}{N} \sum_{i=1}^{N} f_i(x) \qquad (3) $$

The evaluation function given by Eq. 3 is exactly the same
evaluation function used in NK landscapes. In this way,
optimizing the enhanced Echo State Network is equivalent
to optimizing an NK landscape.

The algorithms for training the enhanced Echo State Network
using NK landscape optimization methods are given
in the following descriptions.

Algorithm 1: 

i. Create an artificial neural network ANN(M) with at least
one hidden layer. The last hidden layer is the probe
filter. The matrix $M = [m_1 \ldots m_N]$ defines the connectivity
between the neurons of the probe filter and
the neurons in the output layer.

The connections between the ensemble of outputs and
the probe filter layer are determined by mask(i,j),
which has exactly K + 1 ones (indexed by j) defined
according to the NK neighborhood model (adjacent or
random); the K + 1 neurons selected by mask(i,j)
feed into output i. All the weights and thresholds of
ANN(M) are random. A vector $x \in \{0,1\}^N$ is used to
turn on or off the neurons in the probe filter layer.

ii. Evaluate each output neuron of ANN(M) for all combinations
of the elements of vector x used by the output
neuron. Since each output is connected to K + 1 neurons,
there are $2^{K+1}$ combinations for each output.
Thus, we need $N 2^{K+1}$ total evaluations to evaluate
all subfunctions $f_i(x)$. For example, if N = 6, K = 2
and the adjacent neighborhood model is used, then the
subfunction $f_1(x)$ uses the mask $[1, 1, 1, 0, 0, 0]^T$.
Thus, the first output is evaluated for all eight combinations
of $x_1, x_2, x_3$: 000, 001, 010, ..., 110, 111.

iii. Save ANN(M). Also, save M and the individual performance
evaluations $f_i(x)$ for all combinations of the
elements of vector x used by the output neuron.


Algorithm 2: 

i. Load M and all of the $N 2^{K+1}$ subfunction evaluations
corresponding to $f_i(x)$.

ii. Create an NK model with M and the subfunctions $f_i(x)$.

iii. Optimize the NK model by optimizing the vector x.

iv. Save the best decision vector x*.
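For small N, step iii of Algorithm 2 can be carried out by exhaustive enumeration. The sketch below is one such optimizer under stated assumptions: `mask` and `tables` are hypothetical names for the index lists and stored lookup tables, indices are zero-based, and the plain sum of subfunctions is maximized (a constant normalization factor would not change the maximizer).

```python
from itertools import product

def nk_value(x, mask, tables):
    """Sum of table lookups f_i(x) over all subfunctions."""
    total = 0.0
    for m, table in zip(mask, tables):
        index = 0
        for q in m:                       # pack the selected bits, MSB first
            index = (index << 1) | x[q]
        total += table[index]
    return total

def exhaustive_optimum(n, mask, tables):
    """Try all 2**N bit vectors; feasible only for small N (e.g. N = 20)."""
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=n):
        val = nk_value(x, mask, tables)
        if val > best_val:
            best_x, best_val = list(x), val
    return best_x, best_val
```

No network evaluations are needed at this stage; the search touches only the stored tables, which is why the optimization can run offline.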


Algorithm 3: 

i. Load the artificial neural network ANN(M) created in
Algorithm 1 and the best decision vector x* found in
Algorithm 2.

ii. Evaluate ANN(M) with vector x*. The signal employed
to reproduce the output of the mapped function is
composed of a weighted combination of the outputs
$y_i(u(t), x^*, t)$, i = 1, ..., N, of the neural network.

4. THE EXPERIMENTS 

In order to test the methodology proposed in this work,
the NK Echo State Network is applied to the double pole
balancing problem without velocity information [21]. Two
poles of different length are attached to a cart that moves on
a track of fixed length. Both poles are balanced by pushing
the cart to the left or to the right. The force of the push is
allowed to vary. In this problem, the inputs of the artificial
neural network at step t are composed of the scaled cart position
and the angles of the two poles at step t, i.e., the input vector

$$ u(t) = \left[ x_c(t)/x_c^{\max},\; \theta_1(t)/\theta^{\max},\; \theta_2(t)/\theta^{\max} \right]^T $$

where $x_c(t)$ is the cart position, $\theta_i(t)$ is the angle of the i-th
pole, and $x_c^{\max}$ and $\theta^{\max}$ are the maximum allowed values
used to scale the inputs between -1 and +1. All neurons
use the hyperbolic tangent function with outputs between
-1 and +1 as the sigmoidal squashing function. Since velocity
information is not given as input, the recurrent Echo State
Network is needed to learn to estimate velocity.

In Algorithm 1, each output of the network is independently
evaluated. The action (in Newtons) applied to the
cart at iteration t when evaluating the i-th output for solution
x is given by:

$$ \mathrm{action}(t) = 10\, y_i(u(t), x, t) \qquad (4) $$

The following objective function was introduced by Gruau,
Whitley and Pyeatt [5] and has been regularly used for the
problem of balancing two poles on a cart by a significant
number of researchers [17][2][7].

$$ f = 0.1 f_1 + 0.9 f_{stable} \qquad (5) $$

where t is the number of steps inside the success domain
until a limit of 5000 steps and

$$ f_1 = t / t_{max} \quad (\text{typically } t_{max} = 1000) \qquad (6) $$

and

$$ f_{stable} = \begin{cases} 0, & \text{if } t < 100 \\[4pt] \dfrac{0.75}{\sum_{i=t-100}^{t} \left( |x_c(i)| + |\dot{x}_c(i)| + |\theta_1(i)| + |\dot{\theta}_1(i)| \right)}, & \text{otherwise} \end{cases} \qquad (7) $$


The track length is given by $x_c \in [-2.4, 2.4]$ meters; beyond
this range the cart crashes into the ends of the track.
The system must keep both poles within $\theta_i \in [-36, 36]$ degrees
of vertical. The function $f_1$ indicates that the cart and
pole system has avoided a failed state (where a pole falls, or
the cart crashes) for t time steps. However, for small values
of $t_{max}$ ($t_{max} = 1000$ has been standard) a bang-bang
control strategy might be learned that causes the system
to become more unstable over time; even if the controller
avoids failure for $t_{max}$ time steps, the system will become
more unstable and eventually fail when the system is run
for more than $t_{max}$ time steps. The second function $f_{stable}$
indicates the stability of the system during the last 100 time
steps if t ≥ 100. A higher value of $f_{stable}$ means that the
system is staying close to the ideal state: close to the center
of the track, with small pole angles close to vertical,
and with low velocities, which means the poles and the cart
are not rapidly changing from one extreme state to another.
The value 0.75 was established by tuning the original Cellular
Encoding network, which was the first neural network
to solve this optimization task in 1996; the use of the 0.75
value is probably now meaningless, but it persists for historical
and comparative reasons.
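A minimal sketch of this objective is given below, assuming the commonly used form in which the stability term is 0.75 divided by the summed absolute cart position, cart velocity, pole 1 angle and pole 1 angular velocity. The function name, the `history` data layout, and the use of exactly the last 100 recorded steps are illustrative assumptions.

```python
def fitness(history, t_max=1000):
    """f = 0.1 * f_1 + 0.9 * f_stable for one episode.

    history: one tuple (x_c, x_c_dot, theta_1, theta_1_dot) per time step
             spent inside the success domain, so t = len(history).
    """
    t = len(history)
    f1 = t / t_max
    if t < 100:
        f_stable = 0.0
    else:
        # Assumes the system is not perfectly at rest, so denom > 0.
        denom = sum(abs(x) + abs(xd) + abs(th) + abs(thd)
                    for x, xd, th, thd in history[-100:])
        f_stable = 0.75 / denom
    return 0.1 * f1 + 0.9 * f_stable
```

Since f_1 alone contributes at most 0.1 when t reaches t_max, any value above 0.1 indicates that the episode survived the full horizon and earned some stability credit.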

For the evaluation of f in Algorithm 1, the system always
starts from the state $x_c(0) = \theta_2(0) = \dot{x}_c(0) = \dot{\theta}_1(0) =
\dot{\theta}_2(0) = 0$ and $\theta_1(0) = 4.5$ degrees. The parameters of the
double pole system used here are [2]: mass of cart equal to
1 kg, mass of pole 1 equal to 0.1 kg, mass of pole 2 equal
to 0.01 kg, length of pole 1 equal to 1 m, length of pole 2
equal to 0.1 m, coefficient of friction of the cart on the track
equal to 0.0005, coefficient of friction of the poles equal to
0.000002. The 4th order Runge-Kutta method with integration
step equal to 0.01 was used.
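For reference, a single classical 4th order Runge-Kutta step can be sketched as follows. This is a generic integrator, not the simulator used in the experiments: the `deriv` callback, which would hold the double pole equations of motion, is a hypothetical placeholder and is not given in the text.

```python
def rk4_step(deriv, state, dt=0.01):
    """Advance `state` (a list of floats) by one step of size dt.

    deriv(state) must return the time derivative of each state variable.
    """
    k1 = deriv(state)
    k2 = deriv([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = deriv([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = deriv([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
```

For the cart and pole system, `state` would hold the six variables (cart position and velocity, and the angle and angular velocity of each pole), with the applied force held constant during the step.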


4.1 Empirical Results 

We present the results for the NK Echo State Network presented
in Figure 2 with N = 20, i.e., the number of neurons
in the output layer. The number of neurons in the probe
filter layer was also N = 20. We tested seven values for
the connectivity degree between the last hidden layer and
the output layer: K = 2, 3, 4, 5, 6, 7, 8. Both the random
neighborhood NK Landscape model and
the adjacent neighborhood NK Landscape model were tested.

Table 1: Results for experiments with the Adjacent and Random NK Landscapes are shown for N = 20. An
Evaluation greater than 0.10 means the pole was always balanced; a higher evaluation means the state of
the system was closer to the ideal state. Generalization is tested over 625 initial states (i.e., 625 represents
perfect generalization). Best Single Output shows the best performance over the 20 outputs in the ensemble.
Best-of-100 shows the best generalization achieved out of the 100 runs. The symbol “s” indicates a significant
difference between the generalization of the Best Single Output and the Ensemble.

                Evaluation           Generalization Test
                Best Single Output   Best Single Output          Ensemble
Model      K    mean ± std           mean ± std    Best-of-100   mean ± std       Best-of-100
Adjacent   2    0.12 ± 0.01          241 ± 128     550           229 ± 160        525
           3    0.12 ± 0.01          231 ± 134     604           304 ± 154 (s)    525
           4    0.12 ± 0.01          244 ± 137     525           321 ± 151 (s)    525
           5    0.13 ± 0.01          235 ± 142     525           377 ± 126 (s)    575
           6    0.13 ± 0.01          227 ± 133     544           339 ± 144 (s)    575
           7    0.13 ± 0.01          253 ± 141     550           362 ± 137 (s)    588
           8    0.14 ± 0.01          236 ± 144     550           389 ± 110 (s)    575
Random     2    0.12 ± 0.01          243 ± 151     623           232 ± 152        520
           3    0.12 ± 0.01          245 ± 131     569           298 ± 156 (s)    525
           4    0.12 ± 0.01          278 ± 142     577           323 ± 145 (s)    575
           5    0.13 ± 0.01          252 ± 145     575           318 ± 161 (s)    532
           6    0.13 ± 0.01          269 ± 142     535           348 ± 149 (s)    575
           7    0.13 ± 0.01          220 ± 145     544           369 ± 141 (s)    600
           8    0.14 ± 0.02          204 ± 148     614           377 ± 146 (s)    573

In reservoir computing, there are recurrent connections
between neurons. The connectivity density (α) for each
reservoir is 10%, i.e., each neuron in the reservoir has recurrent
connections to 10% of the neurons in the reservoir.
All weights of the NK Echo State Network are fixed, being
randomly generated in [-0.6, 0.6]. After the initialization,
the recurrent weights in each reservoir are scaled
to a spectral radius equal to 0.95. All neurons use the
hyperbolic tangent function as the activation function.
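The reservoir initialization described here can be sketched with NumPy as follows. The function name and the use of an independent 10% keep-probability per connection (rather than exactly 10% per neuron) are assumptions made for illustration; the default of 60 neurons matches the reservoir size reported for the experiments.

```python
import numpy as np

def make_reservoir(n_neurons=60, density=0.10, w_max=0.6,
                   spectral_radius=0.95, seed=0):
    """Random sparse recurrent weight matrix rescaled to a target spectral radius."""
    rng = np.random.default_rng(seed)
    weights = rng.uniform(-w_max, w_max, size=(n_neurons, n_neurons))
    keep = rng.random((n_neurons, n_neurons)) < density   # roughly 10% of connections
    weights = weights * keep
    rho = np.max(np.abs(np.linalg.eigvals(weights)))      # current spectral radius
    if rho > 0:
        weights *= spectral_radius / rho
    return weights
```

After the rescaling, the largest eigenvalue magnitude of the returned matrix equals the requested spectral radius, and the matrix is never modified again during learning.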

The number of runs is 100. In each run, Algorithm 1 is
used to compute the individual fitness contributions $f_i(x)$.
Thus, the NK Echo State Network is generated and each output
$y_i$ is evaluated using Eq. 5 for all possible combinations
of the elements of vector x used by the i-th output neuron.
Then, Algorithm 2 is used to obtain the best solution x* for
the evaluation given by Eq. 3.

We can use various methods to optimize the NK landscape.
For the adjacent neighborhood landscape model, we
use dynamic programming to find the global optimum
in polynomial time. For the random neighborhood
landscape model, we can use new fast local search algorithms
[19][1] to search the radius r neighborhood; these
algorithms are able to identify improving moves that are r steps
ahead in constant time. When r > 1, many local optima
that exist in the Hamming distance 1 local search
neighborhood are eliminated. When N is small (e.g., N = 20) we can also
use exhaustive enumeration to find the global optimum.
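One way to realize a dynamic program for the adjacent model is sketched below. It is not necessarily the algorithm of the cited work: it assumes zero-based indices, wrap-around neighborhoods (subfunction i reads bits i, ..., i+K modulo N), K < N, and lookup tables indexed with the first bit as the most significant. The first K bits are fixed in turn, and a chain recursion over sliding windows of K bits finds the best completion.

```python
from itertools import product

def pattern_index(bits):
    """Pack a tuple of bits into a table index, most significant bit first."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index

def adjacent_dp(n, k, tables):
    """Maximize sum_i tables[i][x_i .. x_{i+K}] with indices modulo N (K < N)."""
    best_val, best_x = float("-inf"), None
    for head in product((0, 1), repeat=k):          # fix x_0 .. x_{K-1}
        # window (x_i .. x_{i+K-1}) -> (best partial sum, bits chosen so far)
        states = {head: (0.0, head)}
        for i in range(n):
            pos = i + k                              # bit entering the window
            choices = (0, 1) if pos < n else (head[pos - n],)
            new_states = {}
            for window, (val, bits) in states.items():
                for b in choices:
                    pattern = window + (b,)
                    v = val + tables[i][pattern_index(pattern)]
                    key = pattern[1:]
                    if key not in new_states or v > new_states[key][0]:
                        new_bits = bits + (b,) if pos < n else bits
                        new_states[key] = (v, new_bits)
            states = new_states
        val, bits = states[head]                     # window has wrapped back to head
        if val > best_val:
            best_val, best_x = val, list(bits)
    return best_x, best_val
```

The number of window states is at most 2^K for each of the 2^K head assignments, so the work grows linearly in N for fixed K, in contrast to the 2^N cost of exhaustive enumeration.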

Finally, Algorithm 3 is employed to evaluate the NK Echo
State Network with the combined outputs acting as an ensemble.
The output of the ensemble at step t is given by:

$$ y_{ensemble}(u(t), x^*, t) = \sum_{i=1}^{N} a_i(x^*)\, y_i(u(t), x^*, t) \qquad (8) $$

where u(t) is the input vector of the neural network at step
t, x* is the solution vector obtained by Algorithm 2, $y_i$ is
the i-th output of the neural network and the weight $a_i$ is
given by

$$ a_i(x^*) = \frac{f_i(x^*)}{\sum_{j=1}^{N} f_j(x^*)} \qquad (9) $$

i.e., the outputs with better results have higher weights.
This amounts to a simple form of learning where the weights
are set once to combine the ensemble into a single output.
The action applied to the cart at step t is:

$$ \mathrm{action}(t) = 10\, y_{ensemble}(u(t), x^*, t) \qquad (10) $$
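The ensemble combination can be sketched in a few lines. The function names are illustrative, `f_values` is assumed to hold the stored evaluations of each output under x*, and `outputs` the current output values of the N output neurons; the factor 10 is the force scaling stated in the text.

```python
def ensemble_weights(f_values):
    """Normalize the per-output evaluations so that the weights sum to 1."""
    total = sum(f_values)
    return [f / total for f in f_values]

def ensemble_action(outputs, f_values, gain=10.0):
    """Weighted ensemble output scaled to a force (in Newtons)."""
    weights = ensemble_weights(f_values)
    y = sum(a * y_i for a, y_i in zip(weights, outputs))
    return gain * y
```

For example, with evaluations 0.1 and 0.3 the weights are 0.25 and 0.75, so outputs of +1 and -1 combine to -0.5 and an action of -5 N.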


The ensemble is evaluated using the generalization test
proposed in [20]. In this generalization test, the ensemble
is evaluated 625 times, each time with different initial settings
for cart position, cart velocity, pole 1 angle, and pole
1 velocity. The initial pole 2 angle and velocity are always
zero. All combinations of five different initial settings for
each of these four variables are considered ($5^4 = 625$): 5, 25, 50, 75,
and 95% of a reduced range of the variables. The reduced range is between
-2.14 and 2.14 m for cart position, -1.35 and 1.35 m/s for cart
velocity, -3.6 and 3.6 degrees for pole 1 angle, and -8.6 and 8.6
degrees/s for pole 1 velocity. The evaluation of the generalization
test is the number of times that the system remains
in the success domain after 1000 steps.
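The grid of initial states can be generated as sketched below. The mapping of a percentage p to the value lo + p (hi - lo) within each reduced range is an assumption about how the percentages are applied, and the dictionary keys are illustrative names.

```python
from itertools import product

FRACTIONS = (0.05, 0.25, 0.50, 0.75, 0.95)
RANGES = {
    "cart_pos": (-2.14, 2.14),      # m
    "cart_vel": (-1.35, 1.35),      # m/s
    "pole1_angle": (-3.6, 3.6),     # degrees
    "pole1_vel": (-8.6, 8.6),       # degrees/s
}

def initial_states():
    """All 5**4 = 625 combinations; pole 2 always starts at rest and upright."""
    axes = [[lo + f * (hi - lo) for f in FRACTIONS] for lo, hi in RANGES.values()]
    return [dict(zip(RANGES, combo), pole2_angle=0.0, pole2_vel=0.0)
            for combo in product(*axes)]
```

The generalization score is then the number of these 625 starting states from which the controller keeps the system inside the success domain for 1000 steps.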

Table 1 shows the results of experiments where the adjacent
and random NK models were considered. The reservoir
utilizes 60 neurons. The results of the generalization test for
$y_{ensemble}(t)$ are given in the column indicated as “Ensemble”,
while “Best Single Output” indicates the best single value of
$f_i(x)$ among the $N 2^{K+1}$ possible performance values obtained
in Algorithm 1. The results of the generalization test
show that on average the ensemble generally yields better
generalization than the best single output neuron. The column
marked “Best-of-100” shows the generalization of the
best NK Echo State Network out of the 100 that were generated.

From Table 1 we can see that Ensemble regularly produces 
generalization results that are significantly better than the 
generalization results associated with the “Best Single 
Output” neuron. The Wilcoxon signed rank test (at the 0.05 
significance level) was used to compare the generalization 
results. We note that the generalization results for the “Best 
Single Output” were much better than we expected given 
that the only “learning” that was done was to pick the best 
of 2^{K+1} configurations of neurons in the probe filter layer. 
Table 1 also shows there is little or no difference between 
using the random NK Landscape model and the adjacent NK 
Landscape model. This means that the adjacent NK 
Landscape model provides sufficient diversity in neural circuits 
while still being regular enough to be solved in polynomial 
time. 













Table 2: For these experiments the size of the ensemble is N = 100, the adjacent model is used and dynamic 
programming is used to guarantee convergence to the global optimum. When the results from the ensemble 
using only the best 20 outputs are statistically different from the two other distributions (as indicated 
by the Wilcoxon signed rank test), a symbol “s” is shown. Again, 625 represents perfect generalization. 
Generalization increases as N and K increase. 


    | Evaluation         | Generalization Test
    | Best Single Output | Best Single Output       | All 100            | The “Top 20”
K   | mean ± std         | mean ± std | Best-of-100 | mean ± std | best  | mean ± std   | Best-of-100
----+--------------------+------------+-------------+------------+-------+--------------+------------
2   | 0.13 ± 0.01        | 239 ± 137  | 525         | 324 ± 115  | 525   | 451 ± 79 (s) | 525
3   | 0.13 ± 0.01        | 278 ± 141  | 566         | 396 ± 109  | 525   | 478 ± 56 (s) | 586
4   | 0.13 ± 0.01        | 238 ± 151  | 574         | 437 ± 84   | 525   | 490 ± 48 (s) | 575
5   | 0.14 ± 0.01        | 231 ± 147  | 533         | 433 ± 84   | 525   | 489 ± 54 (s) | 599
6   | 0.14 ± 0.01        | 260 ± 160  | 625         | 462 ± 69   | 525   | 494 ± 46 (s) | 599
7   | 0.15 ± 0.01        | 206 ± 150  | 550         | 468 ± 63   | 575   | 498 ± 48 (s) | 612
8   | 0.16 ± 0.02        | 183 ± 139  | 545         | 480 ± 57   | 525   | 504 ± 47 (s) | 614


In terms of performance, any Evaluation above 0.10 means 
that the poles are being balanced for t = 1000 steps. In this 
sense, learning was successful for all values of K. The mean 
evaluation of the output neurons increased as K increased. 
This means that the poles and cart have less change in 
position and velocity over time, and thus, the system is more 
stable. It is also clear that “Generalization” improves as K 
is increased. Recall that generalization tests are executed 
over 625 start states. When N = 20, the generalization 
results range from a mean of 229/625 when K = 2 to 389/625 
when K = 8 for the Adjacent NK Landscape model. A score 
of 625 would mean that the network was able to control the 
system from every start state in the generalization 
test suite; results in Table 1 show this is possible (K=6, Best 
Single Output, Best-of-100). 

The variance for the generalization is high, ranging from a 
standard deviation of 110 to 160 for the Adjacent NK 
Landscape model. The results are very similar for the Random 
NK Landscape model. This high variance means that some 
NK Echo State Networks are very much above average and 
doing an extremely good job at generalization, while other 
NK Echo State Networks are very much below average. The 
column marked “Best-of-100” shows that the best networks 
can achieve very good generalization: a few of the 
“Best-of-100” NK Echo State Networks have generalization results 
above 600. 

4.2 Using a Larger Ensemble 

We next asked what happens when N is increased to N = 
100. These results are shown in Table 2; only the Adjacent 
NK Landscape model is used. All of the runs were again 
successful at balancing the two poles. The difference lies in 
the generalization. With N = 100 and K > 3, the average 
generalization improves to over 400 out of the 625 start 
states; in addition, the standard deviation drops below 85. 

As has already been noted, generalization improves as 
either K or N increases. This of course has a cost, since 
constructing the NK Landscape has a cost of N 2^{K+1} 
evaluations. The results labeled “All 100” in Table 2 include 
all 100 outputs in the ensemble. 
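
The construction cost can be checked with a line of arithmetic. The sketch below uses the hypothetical helper `landscape_evaluations` to tabulate N 2^{K+1} for the configurations compared later in this section.

```python
def landscape_evaluations(n, k):
    """Evaluations needed to build the landscape: N outputs, 2^(K+1) each."""
    return n * 2 ** (k + 1)


# Configurations that appear in the comparison of this section.
configs = [(20, 3), (20, 5), (100, 2), (100, 3), (100, 4)]
costs = {cfg: landscape_evaluations(*cfg) for cfg in configs}
```

The cost is linear in N but doubles with each increment of K, so raising K is the more expensive way to gain generalization.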

We also considered one more strategy to improve 
generalization. This strategy required no additional evaluations. 
After the probe filter has been optimized by turning on and off 
neurons, we checked the behavior of the 100 output nodes 
in the ensemble on the evaluation function, and selected the 
“Top 20” best outputs to create the ensemble. 


These results are also shown in Table 2 and are labeled 
the “Top 20.” As the results show, this again increases 
generalization at virtually zero additional cost. 
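
A possible reading of the “Top 20” selection is sketched below. It assumes that outputs are ranked by their performance values f_i(x*) and that the weights are renormalized over the selected outputs only; the text does not state the renormalization explicitly. The function name and the numbers are hypothetical.

```python
def top_m_ensemble(f_values, m=20):
    """Pick the m best outputs by f_i(x*) and renormalize their weights.

    Returns a dict mapping output index -> weight; other outputs are dropped.
    """
    if not 0 < m <= len(f_values):
        raise ValueError("m must be between 1 and the number of outputs")
    ranked = sorted(range(len(f_values)), key=lambda i: f_values[i],
                    reverse=True)
    chosen = ranked[:m]
    total = sum(f_values[i] for i in chosen)
    if total <= 0:
        raise ValueError("selected performance values must sum to > 0")
    # Assumption: weights are renormalized over the selected outputs only.
    return {i: f_values[i] / total for i in chosen}


# Hypothetical performance values for 5 outputs, keeping the best 2.
weights = top_m_ensemble([0.10, 0.30, 0.05, 0.20, 0.15], m=2)
```

The selection reuses performance values that were already computed, which is why it adds no further evaluations.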

Table 3: Evaluation Results comparing the Adjacent 
Neighborhood NK-Landscape with different values 
of N and K. The results are also compared to other 
results in the literature. All of the 2008 results are 
from Gomez, et al. [4]; these results sometimes 
include improvements over other earlier published 
results for the same methods. The “Top 20” outputs 
were used to create the ensemble when N=100. 


Algorithm            | Number of Feedforward Steps | Generalization
---------------------+-----------------------------+---------------
CE 1996 [|]          | 840,000                     | 300
ESP 1999 [3]         | 169,000                     | 289
ESP 2008 [4]         | 26,342                      | Not Reported
NEAT 2002 [17]       | 33,184                      | 286
NEAT 2008 [4]        | 6,929                       | Not Reported
CoSyNE 2008 [4]      | 3,416                       | Not Reported
N=20, K=3            | 320                         | 304
N=20, K=5            | 1,280                       | 377
N=100, K=2, Top 20   | 800                         | 450
N=100, K=3, Top 20   | 1,600                       | 487
N=100, K=4, Top 20   | 3,200                       | 490


4.3 Comparative Results 

Recent papers have focused more on the number of 
evaluations needed to achieve “successful learning” and less on the 
generalization of the neural network. However, in the end it 
is generalization that is important if the learned solution is 
going to be useful. In Table 3, various results are reported 
from a number of papers published over the last 18 years 
for the double pole balancing problem when no velocity 
information is provided as input. One can observe a dramatic 
improvement in the ESP algorithm between the years 1999 
and 2008. Likewise, NEAT shows considerable improvement 
between 2002 and 2008. 

Methods such as ESP and CoSyNE depend on having a 
predefined and fixed architecture; almost a decade of 
experience solving the double pole balancing problem with no 
velocities surely makes it easy to select a suitable compact 
neural network architecture such that the problem is reduced 
to just learning the weights. But this also means that more 
human experience is exploited in configuring the neural network. 





















Here, we report results for the NK Echo State Network 
for configurations that keep the number of evaluations 
under 4000. Using just 320 evaluations, the NK Echo State 
Network with N = 20 and K = 3 yields an average 
generalization of 304 successes from the 625 possible start states. 
This level of generalization is similar to results reported in 
the earlier literature, but the number of evaluations is very 
low. Better generalization was achieved by setting N = 100 
and K = 3 and then selecting the “Top 20” outputs to be 
included in the ensemble. This configuration used 1600 
evaluations, but the NK Echo State Network was able to 
successfully balance the double pole from 478 of the 625 start 
states on average. If generalization is important, the best 
networks can be selected after multiple runs. 

Finally, it should be noted that the NK Echo State 
Network does not use an architecture that is specific to this 
problem. Theoretically, exactly the same architecture could 
be used to make stock market predictions, or any other 
learning application where an Echo State Network might be 
used. In this sense, the same reservoir can be reused to solve 
very different machine learning problems by reconfiguring 
the probe filter layer. 

5. CONCLUSIONS 

The problem of training an enhanced Echo State Network 
has been converted into the problem of optimizing a spin 
glass system in the form of an NK Landscape. Learning is 
accomplished by turning on and off neurons in a probe filter 
layer. The probe filter is a layer of hidden units that is 
inserted between the reservoir of the Echo State Network and 
the output neurons. The neurons in the probe filter layer 
are not recurrent, but are connected to recurrent neurons in 
the reservoir. 

A well-known reinforcement learning benchmark that 
requires a single output neuron was used to test the NK Echo 
State Network. An additional enhancement of the Echo State 
Network was the use of an ensemble of N outputs, all of which 
attempt to learn the same decision function. The ensemble is 
combined to yield a single output. 

Our empirical results show that the resulting “NK Echo 
State Network” is able to learn the control task of 
balancing two poles on a fixed track with no velocity information. 
Learning was faster than other results that have been 
reported in the literature. Furthermore, generalization 
improved as N and K were increased. 

The use of “neuron selection” to learn, instead 
of adjusting weight vectors, represents a novel approach to 
training neural networks. With the current interest in “Deep 
Learning,” this method may find additional applications in 
training other types of complex neural networks. 